Predicting the unpredictable: a robust nomogram for predicting recurrence in patients with ampullary carcinoma

Objective To screen the risk factors affecting recurrence in patients with ampullary carcinoma (AC) after radical resection, and to construct and visualize a risk prediction model based on Lasso-Cox regression. Methods Clinical data were collected from 162 patients who underwent pancreaticoduodenectomy at Hebei Provincial Cancer Hospital from January 2011 to January 2022. Lasso regression was used in the training group to screen the risk factors for recurrence. Based on these risk factors, the Lasso-Cox regression and Random Survival Forest (RSF) models were compared using the DeLong test to determine the optimal model. Finally, the selected model was validated using clinical data from the validation group. Results The patients were split into training and validation groups at a 7:3 ratio. The variables screened by Lasso regression (CA19-9/GGT, AJCC 8th edition TNM staging, lymph node invasion, differentiation, tumor size, CA19-9, gender, GPR, PLR, drinking history, and complications) were used to build the Lasso-Cox regression model (C-index = 0.845) and the RSF model (C-index = 0.719) in the training group. Based on the DeLong test (P = 0.019), we chose the Lasso-Cox regression model and validated its performance with time-dependent receiver operating characteristic (tdROC) curves, calibration curves, and decision curve analysis (DCA). The areas under the tdROC curves at 1, 3, and 5 years were 0.855, 0.888, and 0.924 in the training group and 0.841, 0.871, and 0.901 in the validation group, respectively. The calibration curves showed good agreement, and DCA showed higher net benefits over a broader range of threshold probabilities for the predictive model. A nomogram was used to visualize the selected model. Conclusion The study established a nomogram based on the Lasso-Cox regression model for predicting recurrence in AC patients. Compared with the RSF-based model, it showed better discrimination in both the training and validation groups.
Supplementary Information The online version contains supplementary material available at 10.1186/s12885-024-11960-0.


Introduction
Ampullary carcinoma (AC) is a rare malignant tumor that arises in the small intestine at the junction of the bile and pancreatic ducts [1]. It accounts for 0.2-0.5% of digestive tract malignant tumors and 6-17% of periampullary tumors [2]. One of the primary treatment options for AC is pancreaticoduodenectomy (PD). The prognosis of AC tends to be better than that of other periampullary tumors. Nevertheless, extensive research and follow-up show that it remains unsatisfactory, particularly because overall survival is relatively poor in patients who experience a recurrence after PD [3]. Therefore, it is crucial to identify and promptly treat individuals at high risk of recurrence to improve the prognosis of AC patients.
To date, few prognostic scoring systems are available for AC, even though numerous studies have integrated demographic and clinical data to build risk-scoring systems for gastrointestinal cancers [4][5][6]. The American Joint Committee on Cancer (AJCC) 8th edition Tumor-Node-Metastasis (TNM) staging system is widely used in clinical practice as the gold standard for evaluating disease progression and prognosis. However, patients with the same stage in this system can have greatly differing survival outcomes. Some researchers have attempted to assess patient prognosis at other levels, including genes and cytokines [7,8]. These approaches have obvious limitations, including more complicated technical requirements, expensive equipment for monitoring the indices, and poor accessibility in the clinic. Consequently, it is essential to create a reliable and accurate prediction model for AC patients.
Many studies have shown that the nomogram is a statistical tool for individualized analysis of clinical events: it quantifies the risk of an event from multiple factors and supports prevention and treatment [9,10]. Most nomograms select their parameters through univariate and multivariate analyses, which have limitations in dealing with multicollinearity between variables. Lasso regression, which is less widely used in the field of AC, allows more accurate and robust models to be built by adding a penalty function [11,12]. However, few studies have used the Lasso-Cox method to build a nomogram for AC. In this investigation, Lasso regression and Cox regression were combined on the basis of clinicopathologic characteristics and prognostic factors. The former allows effective screening of variables, while the latter allows modeling and visualization for direct interpretation; the resulting model was then validated in a separate validation group.
The purpose of this study is to construct a novel Lasso-Cox model for predicting the recurrence of AC, to better understand patients' disease characteristics, and to support customized treatment plans. The model is also intended to identify high-risk individuals so that they can receive more thorough follow-up and surveillance, allowing clinicians to detect recurrence or progression early and take appropriate action. This would be an important step towards precision medicine.

Patients enrolled
A total of 162 patients with ampullary cancer who had undergone pancreaticoduodenectomy at Hebei Provincial Cancer Hospital between January 2011 and January 2022 were included in this study. This study was approved by the Medical Ethics Committee of Hebei Provincial Cancer Hospital. Inclusion criteria: (1) preoperative evaluation met the indication for radical surgical resection and radical pancreaticoduodenectomy was performed; (2) postoperative pathology confirmed AC. Exclusion criteria: (1) the patient received preoperative (neoadjuvant) therapy; (2) the patient died as a result of other illnesses or unexpected events; (3) the patient's clinical and follow-up data were not available. The prediction model was built from the patients' clinicopathological characteristics, inflammatory markers, and tumor grade and stage. The included patients were randomly split into a training group (N1 = 114) and a validation group (N2 = 48) at a ratio of 7:3, and the demographics, laboratory results, and prognosis of the two groups were compared to establish a more reliable and robust model. Clinicopathologic staging followed the AJCC 8th edition criteria for AC [13].
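A random split of this kind can be sketched as follows. The study's analyses were run in R; this Python version is only an illustration, and the function name `split_train_validation`, the seed, and the use of consecutive integers as patient IDs are assumptions. The training group size is passed in directly from the reported numbers rather than derived from the 7:3 ratio.

```python
import random

def split_train_validation(patient_ids, n_train, seed=2024):
    """Randomly split patient IDs into a training and a validation group.

    n_train is given explicitly (114 in the study) so the group sizes
    match the reported ones; the seed is an arbitrary assumption.
    """
    ids = list(patient_ids)
    if not 0 < n_train < len(ids):
        raise ValueError("n_train must be between 1 and len(patient_ids) - 1")
    rng = random.Random(seed)   # local generator keeps the split reproducible
    rng.shuffle(ids)
    return sorted(ids[:n_train]), sorted(ids[n_train:])

# Hypothetical patient IDs 1..162
train_ids, valid_ids = split_train_validation(range(1, 163), n_train=114)
print(len(train_ids), len(valid_ids))   # 114 48
```

Fixing the seed makes the grouping reproducible, which matters when the two groups are later compared for baseline differences.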

Follow-up
All patients were followed up through the outpatient clinic, electronic contact, and the Internet. Patients were re-examined regularly after surgery with imaging examinations and laboratory tests. Reviews were scheduled once a month during the first six months after surgery, once every three months thereafter if no signs of recurrence were seen, and once every six months for patients who had remained recurrence-free for two years. Recurrence was defined as the appearance of new lesions in the vicinity of the original lesion or in other organs, with contrast enhancement in the corresponding areas on imaging (PET-CT, MRI). Recurrence-free survival (RFS) was defined as the time from diagnosis until the first recurrence or the final follow-up appointment. Overall survival (OS) was defined as the time from diagnosis to death or final follow-up.
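The RFS definition above translates into a time and an event indicator for each patient. The sketch below is illustrative only: the function names, the conversion of days to months with an average month length, and the example dates are assumptions rather than details from the study.

```python
from datetime import date

def months_between(start, end):
    """Approximate interval in months (average month = 30.4375 days)."""
    return (end - start).days / 30.4375

def rfs_record(diagnosis, last_follow_up, recurrence=None):
    """Return (time_in_months, event) for recurrence-free survival.

    event = 1 if a recurrence was observed; event = 0 if the patient was
    censored at the last follow-up without a recorded recurrence.
    """
    if recurrence is not None:
        return months_between(diagnosis, recurrence), 1
    return months_between(diagnosis, last_follow_up), 0

# Hypothetical patients: one with recurrence, one censored
print(rfs_record(date(2015, 1, 1), date(2018, 1, 1), recurrence=date(2016, 1, 1)))
print(rfs_record(date(2015, 1, 1), date(2018, 1, 1)))
```

OS can be coded the same way by replacing the recurrence date with the date of death.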

Statistical analysis
The optimal cutoff values for laboratory and prognostic indicators were determined using ROC curve analysis. Continuous variables were expressed as mean ± standard deviation or median (interquartile range, IQR) and compared between groups using Student's t-test or the rank sum test; categorical variables were compared using the chi-square test. Recurrence-free survival curves were plotted using the Kaplan-Meier method. Lasso regression was used to screen for risk factors. Cox regression and random survival forest (RSF) were then used to construct two different prediction models from the variables selected by Lasso regression. The DeLong test was used to compare model performance. The hazard ratio (HR) of each prognostic variable, which can serve as its risk score weight for assessing prognosis, was calculated with the Cox proportional hazards regression model. Using the variable importance (VIMP) approach, we determined the relative weight of each predictor in the RSF model [22]. Model discrimination was assessed using time-dependent ROC curves and the C-index, and consistency was assessed using calibration curves. Clinical applicability was assessed using decision curve analysis. All statistical analyses were performed using R software v4.3.1 (R Foundation for Statistical Computing, Vienna, Austria; randomForestSRC, party, partykit, and VIM packages).
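As an illustration of the Kaplan-Meier method named above, the following sketch computes the product-limit estimate from follow-up times and event indicators. It is a minimal Python example, not the R code used in the study, and the function name and the five example observations are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times  : follow-up times (e.g. months)
    events : 1 = recurrence observed, 0 = censored
    Returns a list of (time, survival probability) at each event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations that share the same time point
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving   # events and censored patients leave the risk set
    return curve

# Hypothetical data: 5 patients, 3 recurrences, 2 censored
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])   # [(6, 0.8), (12, 0.6), (20, 0.3)]
```

Censored patients do not lower the curve themselves, but they shrink the risk set, which makes later events weigh more heavily.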

Characteristics of patients
During the study period, a total of 179 patients with ampullary tumors underwent radical pancreaticoduodenectomy. Of these, 2 patients were diagnosed with neuroendocrine tumors postoperatively, 3 died as a result of accidents, 1 died within 30 days of surgery, and 11 were excluded because of incomplete information, leaving 162 patients with AC for enrollment. Among them, 94 (58.02%) were male and 68 (41.98%) were female. Additionally, 58 patients (35.80%) had a history of drinking, 68 (41.98%) had underlying illnesses, 38 (23.46%) had undergone preoperative jaundice reduction, and 139 (85.80%) had received perioperative blood transfusions. Tumors were staged clinically according to the AJCC 8th edition staging system. The "survminer" package in R was used to calculate optimal cutoffs for laboratory indicators, prognostic indicators, and inflammatory markers. Detailed clinical characteristics are given in Table 1.

Prognosis-related data
The median follow-up time was 41 months (IQR = 21-59 months). Of the 162 patients, 81 experienced recurrence and 73 died during follow-up. The 1-, 3-, and 5-year OS rates were 86.8%, 60.1%, and 43.8%, and the corresponding RFS rates were 68.3%, 53.6%, and 40.1%, respectively. Patients were randomly assigned in a 7:3 ratio to a training group (N1 = 114) and a validation group (N2 = 48). The study's variables included clinicopathological traits, tumor grade, stage, and pertinent inflammation-based markers. The optimal cutoff values for each prognostic indicator in the training group were calculated in R, and the corresponding plots are shown in the Supplementary Figure. The optimal cutoff values for the inflammation-based markers were then used to plot the relevant K-M survival curves (Supplementary Figs. 1, 2). The Kaplan-Meier curves of OS stratified by recurrence status showed that OS was shorter in patients with recurrence than in those without (Supplementary Fig. 3). It is therefore necessary to develop a prognostic model for predicting recurrence so that patients at high risk can be identified and treated in time. Comparison of the training and validation groups showed no statistically significant difference in any variable (P > 0.05) (Table 1), which indicates that the random grouping was reasonable.

Performance comparison between prognostic models
Risk factors were screened from all variables by Lasso regression, and the change in the coefficient of each variable is shown in Fig. 1A. Iterative analysis was then performed using 10-fold cross-validation, and a model with excellent performance was obtained when λ = 3.046 (log λ = -1.70) (Fig. 1B). The screened variables were gender, drinking history, complications, tumor size, CA19-9, CA19-9/GGT, PLR, GPR, AJCC 8th edition TNM staging, differentiation, and lymph node invasion. A Cox proportional hazards regression model was then fitted with the variables screened by Lasso regression (Table 2). The C-index in the training set was 0.845, and a forest plot was generated from the Cox regression analysis (Fig. 2).
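For reference, Lasso screening in a Cox model is commonly written as a penalized partial likelihood. The notation below is an assumption introduced for illustration and does not appear in the study: $x_i$ is the covariate vector of patient $i$, $\delta_i$ the recurrence indicator, $t_i$ the observed time, $R(t_i)$ the set of patients still at risk at $t_i$, and $p$ the number of candidate variables.

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1} \left[ x_i^{\top}\beta
  - \log \sum_{k \in R(t_i)} \exp\!\left( x_k^{\top}\beta \right) \right],
\qquad
\hat{\beta}_{\lambda} = \arg\max_{\beta}
  \left\{ \ell(\beta) - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

Larger values of $\lambda$ shrink more coefficients exactly to zero, which is how the penalty performs variable selection, and cross-validation is one way to choose $\lambda$. Software packages may scale the likelihood term differently (for example by the sample size), so $\lambda$ values are not necessarily comparable across implementations.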
The RSF model was built in the training set using the variables selected through Lasso regression (Fig. 1C). After parameter tuning, the error rate of the model stabilized at ntree = 650, and the model's C-index was 0.719. According to the VIMP method, the factors ranked in the following order of importance: CA19-9/GGT, AJCC 8th edition TNM staging, lymph node invasion, differentiation, tumor size, CA19-9, gender, GPR, PLR, drinking history, and complications (Fig. 1D). The C-index of the prediction model based on Lasso-Cox regression (0.845) was higher than that of the model based on random survival forest (0.719), and the DeLong test also favored the Lasso-Cox regression model (P = 0.019). Therefore, the prediction model established by Lasso-Cox regression was used in this study to predict recurrence in patients with AC.
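The C-index used for this comparison measures how often the model assigns a higher risk to the patient who recurs earlier. The sketch below implements Harrell's version of the statistic; the study does not state which variant was computed, so this choice, the function name, and the four example patients are assumptions.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when patient i also has the higher predicted risk; tied risk scores
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue   # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical data: 4 patients, the third one censored
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))   # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which 0.845 and 0.719 are compared.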

Lasso-Cox regression model for the training group: performance and clinical applicability
Compared with the AJCC 8th TNM staging system alone (C-index = 0.746), the Lasso-Cox regression model (C-index = 0.845) demonstrated a higher C-index. The prediction model's calibration curves for 1, 3, and 5 years showed good concordance between predicted and observed outcomes (Fig. 3A-C). At 1, 3, and 5 years, the prediction model's area under the curve (AUC) was 0.855, 0.888, and 0.924, respectively (Fig. 4A). Decision curve analysis (DCA) is an assessment technique that emphasizes the net clinical benefit of prediction models [23]. Compared to the AJCC 8th TNM staging system, the nomogram produced higher net benefits over a wider range of threshold probabilities (Supplementary Fig. 4).
Fig. 1 Variables screened by Lasso regression. (A) The variation characteristics of the coefficients of the variables; (B) The 10-fold cross-validation method is used to select the optimal value of the parameter λ in the Lasso regression model
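The net benefit plotted in decision curve analysis can be sketched as below. This is a simplified illustration for a binary outcome at a fixed time horizon (recurrence yes or no) that ignores censoring, which a survival-based DCA would need to handle; the function names and example values are hypothetical.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    outcomes: 1 = recurrence, 0 = no recurrence (binary simplification).
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risk, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)

def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: intervene on every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical predicted risks and observed recurrences for 5 patients
risks = [0.9, 0.8, 0.3, 0.2, 0.6]
recurred = [1, 1, 0, 0, 0]
print(round(net_benefit(risks, recurred, 0.5), 3))        # 0.2
print(round(net_benefit_treat_all(recurred, 0.5), 3))     # -0.2
```

A decision curve repeats this calculation over a range of thresholds; a model is considered useful at thresholds where its net benefit exceeds both the treat-all strategy and zero (treat none).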

Validation of the lasso-cox regression model
The optimal cutoff values of the prognostic indicators from the training group were applied to the validation group, and the K-M survival curves of the key variables were plotted. For most of the risk factors identified by Lasso regression, the groups defined by these cutoff values differed significantly in survival (Supplementary Fig. 5) (P < 0.1; note: due to the small sample size, P < 0.1 was regarded as statistically significant [24,25]). The RSF and Lasso-Cox regression models were then applied to the patients in the validation group to compare their performance. The C-index of the Lasso-Cox regression model (0.867) remained higher than that of the RSF model (0.762). It was also higher than that of the AJCC 8th TNM staging system alone (C-index = 0.758). The 1-, 3-, and 5-year calibration curves for the validation group showed good concordance between predicted and observed outcomes (Fig. 3D-F). The prediction model's tdAUC was 0.841, 0.871, and 0.901 at 1, 3, and 5 years, respectively (Fig. 4B). In DCA, the nomogram generated larger net benefits than the AJCC 8th TNM staging system over a wider range of threshold probabilities (Supplementary Fig. 4).
The Lasso-Cox regression model was therefore chosen to predict the likelihood of recurrence in AC patients following radical surgery. We converted the mathematical model into a visual nomogram for straightforward clinical use (Fig. 5). To use the nomogram, the scores of the variables it contains are summed. A vertical line drawn at the total score then intersects the three lines representing the predicted RFS. The values at the intersection points are the individual's expected 1-, 3-, and 5-year RFS rates. The optimal cutoff value was calculated from the total nomogram score; the accuracy of the total score was evaluated with the Cox regression model (C-index = 0.894); and the K-M survival curve was plotted based on the optimal cutoff value (Supplementary Fig. 6) (P < 0.001).
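Behind a Cox-based nomogram, the total score is a rescaled linear predictor, and the predicted survival at time t follows S(t | x) = S0(t)^exp(lp). The sketch below shows that calculation in Python. All coefficients, variable names, and baseline survival values in it are made up for illustration and are not the fitted values from Table 2 or Fig. 5.

```python
import math

def linear_predictor(coefs, covariates):
    """Sum of coefficient * covariate value over the model variables."""
    return sum(coefs[name] * covariates[name] for name in coefs)

def predicted_rfs(baseline_survival, coefs, covariates):
    """Predicted RFS at each time point under a Cox model.

    S(t | x) = S0(t) ** exp(lp), where S0(t) is the baseline survival
    for a patient whose covariates are all at the reference level.
    """
    lp = linear_predictor(coefs, covariates)
    return {t: s0 ** math.exp(lp) for t, s0 in baseline_survival.items()}

# Hypothetical log hazard ratios and baseline RFS at 1, 3, and 5 years
coefs = {"lymph_node_invasion": 0.8, "tumor_size_large": 0.5}
baseline = {1: 0.95, 3: 0.85, 5: 0.75}

low_risk = predicted_rfs(baseline, coefs, {"lymph_node_invasion": 0, "tumor_size_large": 0})
high_risk = predicted_rfs(baseline, coefs, {"lymph_node_invasion": 1, "tumor_size_large": 1})
print(low_risk)
print(high_risk)
```

Reading the three RFS axes of a nomogram at a given total score corresponds to evaluating this formula at t = 1, 3, and 5 years.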

Discussion
Little research has been done on prognostic modeling for AC. To the best of our knowledge, this study is the first to incorporate inflammatory parameters and to develop a predictive model using the Lasso-Cox approach. This model enables more precise and reliable identification of patients at high risk of developing recurrence. In this study, 162 patients were split into training and validation groups at a 7:3 ratio. Eleven clinically common and easily available variables (CA19-9/GGT, AJCC 8th edition TNM staging, lymph node invasion, differentiation, tumor size, CA19-9, gender, GPR, PLR, drinking history, and complications) were screened using Lasso regression in the training group. The screened risk variables were used to create the Lasso-Cox and RSF models. Additionally, the VIMP approach was used to rank the importance of the 11 variables in the RSF model, and the RSF model's predictive capability was assessed. The predictive capabilities of the two models were compared to choose the better one. The RSF model had a lower C-index than the Lasso-Cox model (0.719 vs. 0.845), and the DeLong test also favored the Lasso-Cox regression model (P = 0.019). Furthermore, Lasso regression handles multicollinearity among variables better than univariate analysis. Therefore, the Lasso-Cox regression model was determined to be the best option. To assess the discrimination and accuracy of the model, tdROC curves, calibration curves, and DCA were used. The Lasso-Cox model was then applied to the validation group. Recalculating the C-index of the two models in the validation group showed that the Lasso-Cox model (C-index = 0.867) again had a higher C-index than the RSF model (C-index = 0.762), and the 1-, 3-, and 5-year calibration curves were reasonably consistent with the observed outcomes. The 1-, 3-, and 5-year tdAUC values of the prediction model were 0.841, 0.871, and 0.901, respectively. In DCA, the Lasso-Cox regression model generated greater net benefits over a wider range of threshold probabilities. Based on the Lasso-Cox model and the variable scores, a visual nomogram was constructed. A vertical line at the patient's total score crosses the three lines denoting the predicted probability of recurrence-free survival, and the expected 1-, 3-, and 5-year RFS values are read at the intersection points.
It is recognized that a poor prognosis in AC is related to drinking history, complications, tumor size, and CA19-9 [26][27][28]. AJCC 8th edition TNM staging, lymph node invasion, and differentiation are currently the key criteria for assessing tumor progression and guiding patient follow-up; these are uncontroversial and need no further explanation [28,29].
The prognostic value of inflammatory parameters in cancer is somewhat controversial, and the mechanisms are still unclear. The following inflammatory markers have been reported in the literature to be associated with digestive malignancies: CA19-9/GGT [14], PLR [15], NLR [16], AAPR [17], GLR [18], GPR [19], and AGR [20]. A substantial body of research indicates that inflammatory markers are correlated with prognosis and survival in AC [30].
Fig. 5 Nomogram used to predict time-related recurrence in patients with AC. AC, ampullary carcinoma
First, the systemic inflammatory response is the most common host-tumor interaction in cancer patients [31,32]. All phases of tumor growth, including initiation, progression, malignant transformation, metastasis, and treatment resistance, are influenced by the inflammatory response [33]. On the one hand, the majority of malignant tumors trigger a transcriptional program that attracts leukocytes, produces pro-oncogenic chemokines and cytokines, stimulates angiogenesis, and decreases albumin synthesis in the liver, which results in an intrinsic inflammatory response. On the other hand, the tumor microenvironment increases the levels of pro-inflammatory mediators and signaling molecules in the pathways that drive angiogenesis and support tumor activity [30]. As a result, there are more neutrophils and platelets in the blood, but fewer lymphocytes and lower amounts of albumin. Consequently, a poor prognosis is associated with an elevated inflammatory load in cancer patients [34].
There are several potential limitations in this study. First, it was a single-center retrospective analysis using data from one hospital and including only Chinese patients. Differences in the level of care, clinical practice, and patient management among healthcare institutions, as well as treatment modalities and preferences specific to this center, may limit how representative the sample is. The results may not be generalizable to other countries or populations. Furthermore, the dataset contains a small sample of surgically treated AC patients, which limits the generalizability and external validity of the results. Multicenter datasets with large sample sizes are therefore required for future validation.
A nomogram based on the Lasso-Cox regression model was constructed in this study to evaluate patient risk for postoperative recurrence. Compared to models created using other methods, the Lasso-Cox regression model is more reliable and accurate. Additionally, the nomogram gives physicians a reference for visualizing and analyzing each patient's recurrence risk and making timely and accurate clinical decisions, which is important for identifying AC patients at high risk of recurrence and planning their subsequent treatment.

Fig. 4 Time-dependent ROC of predicted 1-, 3-, and 5-year RFS based on Cox regression modeling in the training set and validation set. (A) training set; (B) validation set. RFS, recurrence-free survival

Table 2 Cox proportional hazards regression to predict recurrence based on Lasso regression
Fig. 2 Forest plot based on Cox proportional hazards regression model in the training set